Best Practice for Data-Driven Convolution Filter HLS Kernel

Convolution filters play an important role in modern FPGA applications and are the cornerstone of many popular workloads. In image processing, a convolution kernel with different parameters sliding across an image can perform different tasks, such as sharpening, blurring, or edge detection.

In neural network accelerators, regardless of the network architecture or layer type, convolution is likewise the dominant computing unit; the filter kernel simply becomes trained weight data. Designing the convolution filter kernel is therefore like building the most basic and important block of a Lego set.

Using an abstract model as the basis, two types of task-level parallelism (TLP) models can be used to structure and design your application. TLP can be data-driven or control-driven, and a single design can mix both kinds of tasks. The main difference between the two models is how tasks are triggered: data-driven tasks start executing as soon as data is available on their input streams, while control-driven tasks are sequenced explicitly by the program's control flow.
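As a rough analogy in plain C++ (here `std::queue` stands in for an HLS stream channel; this is not the actual Vitis HLS `hls::task` API), a data-driven task fires whenever its input stream holds data, with no central controller sequencing the call:

```cpp
#include <cassert>
#include <queue>

// Plain-C++ analogy of a data-driven task: it runs for as long as its
// input stream holds data, rather than being invoked at a fixed point
// in a control-driven program sequence. The name and the x2 computation
// are illustrative placeholders.
void scale_task(std::queue<int>& in, std::queue<int>& out) {
    while (!in.empty()) {            // fire while data is available
        int v = in.front();
        in.pop();
        out.push(v * 2);             // some per-sample computation
    }
}
```

In a control-driven design, by contrast, the surrounding program would decide exactly when and how many times the function is called.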

In this section, we examine best practices for writing a data-driven application on an FPGA device, taking a convolution filter as an example.

| Part | Topic | Description | Environment |
|------|-------|-------------|-------------|
| 1 | Software Implementation | Teaching Case: Simple Convolution Filter Without Considering Boundary Conditions | Jupyter Notebook |
|   |   | Industrial Case: Extensible Universal Convolution Filter Kernel |   |
| 2 | HLS Kernel Programming | Determine the Design Specifications | AMD Vitis HLS 2023.2 |
|   |   | TLP: Partition the Code into a Load-Compute-Store Pattern |   |
|   |   | TLP: Partition the Compute Blocks into Smaller Functions |   |
|   |   | TLP: Connect the Load, Compute, and Store Functions |   |
|   |   | DLP: Scaling/Unroll - Determine the Unroll Factor |   |
|   |   | DLP: Enable Pipelining with II = 1 |   |
|   |   | DLP: Maximize Memory Efficiency |   |
| 3 | System-Level Integration | Create the Kernel Graph and the Test Bench | Jupyter Notebook |
|   |   | Load the Overlay and Run the Application on the PYNQ Framework |   |
|   |   | Visualize the Results and Analyze the Performance |   |

Copyright © 2024 Advanced Micro Devices